Refactor the total norm computation in grad clipping in APS #3243
Conversation
This pull request was exported from Phabricator. Differential Revision: D79128843
The branch was then force-pushed repeatedly as the diff was re-exported from Phabricator, with the same exported-from-Phabricator comment and the same commit message (meta-pytorch#3243, Differential Revision: D79128843) repeated after each push: 6d8d83c to f57bf43, f57bf43 to 43738ef, 43738ef to 4f86531, 4f86531 to f9be4b0, f9be4b0 to 48c4acb, 48c4acb to 1305e2f, 1305e2f to 5203bf7, 29f3764 to 5199ed0, 5199ed0 to d35993b, 969fda9 to d7adb15, d7adb15 to 8fbd6ad, 8fbd6ad to edb808e, edb808e to 39608e9, 39608e9 to 7e33149, 7e33149 to eac06dd, eac06dd to bb1e961, bb1e961 to dc37e8f, dc37e8f to 8c94d2d, and finally 8c94d2d to 41a0c08.
Summary: Refactored the previous code for applying gradient clipping across DDP and FSDP parameters. Added a new function _compute_total_norm() that takes in the replicated (DDP) and sharded (FSDP) params provided to the GradientClippingOptimizer class and computes the total gradient norm over the given parameters.
Differential Revision: D79128843
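For readers outside the diff: the commit message above is the only technical description in this thread, so here is a minimal, hypothetical sketch of the idea it describes, not the actual APS/torchrec code from this revision. The signature, helper name, and parameter names below are assumptions for illustration. The point it shows: replicated (DDP) gradients are identical on every rank and need no communication, while sharded (FSDP) gradients hold only a local shard, so their local norm contributions must be all-reduced before the two totals are combined.

```python
# Hypothetical sketch of a _compute_total_norm()-style helper; NOT the code
# from D79128843. Assumes all grads live on the same device.
from typing import Iterable, List, Optional

import torch
import torch.distributed as dist


def _compute_total_norm(
    replicated_params: Iterable[torch.nn.Parameter],
    sharded_params: Iterable[torch.nn.Parameter],
    norm_type: float = 2.0,
    process_group: Optional[dist.ProcessGroup] = None,
) -> torch.Tensor:
    """Total gradient norm over replicated (DDP) + sharded (FSDP) params.

    Assumes a finite p-norm: norm**p sums across disjoint shards, so one
    scalar all-reduce combines the sharded contributions. An infinity
    norm would need a MAX reduction instead.
    """
    replicated_grads = [p.grad for p in replicated_params if p.grad is not None]
    sharded_grads = [p.grad for p in sharded_params if p.grad is not None]
    all_grads = replicated_grads + sharded_grads
    device = all_grads[0].device if all_grads else torch.device("cpu")

    def local_norm_pow(grads: List[torch.Tensor]) -> torch.Tensor:
        # p-th power of the p-norm over all gradients in `grads`.
        if not grads:
            return torch.zeros((), device=device)
        per_grad = torch.stack(
            [torch.linalg.vector_norm(g, norm_type) for g in grads]
        )
        return torch.linalg.vector_norm(per_grad, norm_type) ** norm_type

    replicated_pow = local_norm_pow(replicated_grads)  # identical on every rank
    sharded_pow = local_norm_pow(sharded_grads)  # covers the local shard only

    # Sum shard contributions across ranks; replicated grads need no comm.
    if sharded_grads and dist.is_available() and dist.is_initialized():
        dist.all_reduce(sharded_pow, op=dist.ReduceOp.SUM, group=process_group)

    return (replicated_pow + sharded_pow) ** (1.0 / norm_type)
```

The design point worth noting is that raising each group's norm to the p-th power makes the contributions additive across disjoint shards, so the distributed part of the computation reduces to a single scalar all-reduce rather than gathering gradients.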